
Collaborating Authors: functional element


KeySG: Hierarchical Keyframe-Based 3D Scene Graphs

Werby, Abdelrhman, Rotondi, Dennis, Scaparro, Fabio, Arras, Kai O.

arXiv.org Artificial Intelligence

In recent years, 3D scene graphs have emerged as a powerful world representation, offering both geometric accuracy and semantic richness. Combining 3D scene graphs with large language models enables robots to reason, plan, and navigate in complex human-centered environments. However, current approaches for constructing 3D scene graphs are semantically limited to a predefined set of relationships, and their serialization in large environments can easily exceed an LLM's context window. We introduce KeySG, a framework that represents 3D scenes as a hierarchical graph consisting of floors, rooms, objects, and functional elements, where nodes are augmented with multi-modal information extracted from keyframes selected to optimize geometric and visual coverage. The keyframes allow us to efficiently leverage VLMs to extract scene information, alleviating the need to explicitly model relationship edges between objects and enabling more general, task-agnostic reasoning and planning. Our approach can process complex and ambiguous queries while mitigating the scalability issues associated with large scene graphs by utilizing a hierarchical retrieval-augmented generation (RAG) pipeline to extract relevant context from the graph. Evaluated across four distinct benchmarks -- including 3D object segmentation and complex query retrieval -- KeySG outperforms prior approaches on most metrics, demonstrating its superior semantic richness and efficiency.
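
The abstract describes the representation only in prose; as a rough illustration, the following is a minimal, hypothetical Python sketch of such a floor-room-object-functional-element hierarchy with keyframe-derived captions on each node, plus a naive hierarchical retrieval step that only descends into branches relevant to the query. All class and function names (Node, retrieve) are illustrative assumptions, not the authors' implementation.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        """A node in the hierarchical scene graph (floor, room, object, or functional element)."""
        name: str
        level: str                                          # "floor" | "room" | "object" | "functional_element"
        captions: List[str] = field(default_factory=list)   # text extracted from selected keyframes
        children: List["Node"] = field(default_factory=list)

    def retrieve(node: Node, query_terms: List[str]) -> List[Node]:
        """Naive hierarchical retrieval: descend only into branches whose captions
        mention a query term, so the serialized context handed to the LLM stays small."""
        hits: List[Node] = []
        relevant = any(t.lower() in c.lower() for t in query_terms for c in node.captions)
        if relevant or node.level == "floor":
            for child in node.children:
                hits.extend(retrieve(child, query_terms))
            if relevant and node.level in ("object", "functional_element"):
                hits.append(node)
        return hits

    if __name__ == "__main__":
        handle = Node("drawer handle", "functional_element", ["metal handle on the top drawer"])
        cabinet = Node("cabinet", "object", ["wooden kitchen cabinet with drawers"], [handle])
        kitchen = Node("kitchen", "room", ["kitchen with cabinets, drawers, and an oven"], [cabinet])
        floor = Node("floor_0", "floor", [], [kitchen])
        print([n.name for n in retrieve(floor, ["drawer"])])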


FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction

Rotondi, Dennis, Scaparro, Fabio, Blum, Hermann, Arras, Kai O.

arXiv.org Artificial Intelligence

The concept of 3D scene graphs is increasingly recognized as a powerful semantic and hierarchical representation of the environment. Current approaches often address this at a coarse, object-level resolution. In contrast, our goal is to develop a representation that enables robots to directly interact with their environment by identifying both the location of functional interactive elements and how these can be used. To achieve this, we detect and store objects at a finer resolution, concentrating on affordance-relevant parts. The primary challenge lies in the scarcity of data that extends beyond instance-level detection and the inherent difficulty of capturing detailed object features using robotic sensors. We leverage currently available 3D resources to generate 2D data and train a detector, which is then used to augment the standard 3D scene graph generation pipeline. Through our experiments, we demonstrate that our approach achieves functional element segmentation comparable to state-of-the-art 3D models and that our augmentation enables task-driven affordance grounding with higher accuracy than current solutions.
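
As a rough sketch of the augmentation step the abstract describes, the snippet below attaches fine-grained functional-element detections (e.g. handles) to the nearest object node of an existing scene graph. The data classes, the nearest-neighbour assignment rule, and the 0.75 m distance threshold are illustrative assumptions, not the paper's actual pipeline.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class PartDetection:
        """A functional-element detection (e.g. handle, knob, button) lifted into the scene frame."""
        label: str
        affordance: str                              # e.g. "pull", "rotate", "press"
        center: Tuple[float, float, float]

    @dataclass
    class ObjectNode:
        name: str
        center: Tuple[float, float, float]
        parts: List[PartDetection] = field(default_factory=list)

    def attach_parts(objects: List[ObjectNode], detections: List[PartDetection],
                     max_dist: float = 0.75) -> None:
        """Greedy augmentation step: assign each functional-element detection to the
        nearest object node, if it is close enough to plausibly belong to it."""
        for det in detections:
            def dist(o: ObjectNode) -> float:
                return sum((a - b) ** 2 for a, b in zip(o.center, det.center)) ** 0.5
            nearest = min(objects, key=dist)
            if dist(nearest) <= max_dist:
                nearest.parts.append(det)

    if __name__ == "__main__":
        cabinet = ObjectNode("cabinet", (1.0, 0.0, 0.5))
        fridge = ObjectNode("fridge", (3.0, 0.0, 0.9))
        detections = [PartDetection("handle", "pull", (1.1, 0.1, 0.6)),
                      PartDetection("handle", "pull", (3.0, 0.2, 1.0))]
        attach_parts([cabinet, fridge], detections)
        print(cabinet.parts, fridge.parts, sep="\n")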


SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection

Engelbracht, Tim, Zurbrügg, René, Pollefeys, Marc, Blum, Hermann, Bauer, Zuria

arXiv.org Artificial Intelligence

Despite increasing research efforts on household robotics, robots intended for deployment in domestic settings still struggle with more complex tasks such as interacting with functional elements like drawers or light switches, largely due to limited task-specific understanding and interaction capabilities. These tasks require not only detection and pose estimation but also an understanding of the affordances these elements provide. To address these challenges and enhance robotic scene understanding, we introduce SpotLight, a comprehensive framework for robotic interaction with functional elements, specifically light switches. Furthermore, this framework enables robots to improve their environmental understanding through interaction. Leveraging VLM-based affordance prediction to estimate motion primitives for light switch interaction, we achieve up to 84% operation success in real-world experiments. We further introduce a specialized dataset containing 715 images as well as a custom detection model for light switch detection. We demonstrate how the framework can facilitate robot learning through physical interaction by having the robot explore the environment and discover previously unknown relationships in a scene graph representation. Lastly, we propose an extension to the framework to accommodate other functional interactions such as swing doors, showcasing its flexibility. Videos and Code: timengelbracht.github.io/SpotLight/
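
The abstract mentions using VLM-predicted affordances to estimate motion primitives for light-switch interaction; the sketch below shows one plausible way such a mapping could look, turning a predicted press point and push axis into a straight-line push primitive. The structures, field names, and the 5 cm pre-contact offset are assumptions for illustration, not SpotLight's actual interface.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class SwitchAffordance:
        """Hypothetical output of a VLM affordance query for a detected light switch."""
        switch_type: str                             # e.g. "rocker", "toggle", "push_button"
        press_point: Tuple[float, float, float]      # 3D point on the switch face (robot frame)
        press_axis: Tuple[float, float, float]       # direction to push

    @dataclass
    class PushPrimitive:
        start: Tuple[float, float, float]
        end: Tuple[float, float, float]

    def affordance_to_primitive(aff: SwitchAffordance, depth: float = 0.01) -> PushPrimitive:
        """Turn the predicted affordance into a straight-line push: approach the press
        point along the predicted axis, then push a centimetre into the switch."""
        norm = sum(a * a for a in aff.press_axis) ** 0.5 or 1.0
        unit = tuple(a / norm for a in aff.press_axis)
        start = tuple(p - 0.05 * u for p, u in zip(aff.press_point, unit))   # 5 cm pre-contact offset
        end = tuple(p + depth * u for p, u in zip(aff.press_point, unit))
        return PushPrimitive(start, end)

    if __name__ == "__main__":
        aff = SwitchAffordance("rocker", (0.40, 0.10, 1.20), (1.0, 0.0, 0.0))
        print(affordance_to_primitive(aff))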


Assessing the potential of AI-assisted pragmatic annotation: The case of apologies

Yu, Danni, Li, Luyang, Su, Hang, Fuoli, Matteo

arXiv.org Artificial Intelligence

Certain forms of linguistic annotation, like part-of-speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores automating pragma-discursive corpus annotation using large language models (LLMs). We compare ChatGPT, the Bing chatbot, and a human coder in annotating apology components in English based on the local grammar framework. We find that the Bing chatbot outperformed ChatGPT, with accuracy approaching that of a human coder. These results suggest that AI can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient and scalable. Keywords: linguistic annotation, function-to-form approaches, large language models, local grammar analysis, Bing chatbot, ChatGPT
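
To make the annotation setup concrete, here is a minimal sketch of how an LLM could be prompted to label apology components and how its JSON answer could be filtered against a fixed label set. The component labels and the JSON schema are illustrative assumptions; they are not the exact local-grammar scheme or prompts used in the study.

    import json

    # Functional components of an apology (illustrative label set, not the study's exact scheme).
    COMPONENTS = ["apologiser", "apologising", "intensifier", "apologisee", "reason"]

    def build_prompt(utterance: str) -> str:
        """Construct an annotation prompt asking the model to map each span of the
        utterance to one of the apology components and answer in JSON."""
        return (
            "Annotate the apology below using these functional components: "
            + ", ".join(COMPONENTS) + ".\n"
            "Return a JSON list of {\"span\": ..., \"component\": ...} objects.\n\n"
            f"Apology: {utterance}"
        )

    def parse_annotation(llm_output: str):
        """Parse the model's JSON answer, keeping only labels from the agreed scheme."""
        try:
            items = json.loads(llm_output)
        except json.JSONDecodeError:
            return []
        return [it for it in items if it.get("component") in COMPONENTS]

    if __name__ == "__main__":
        print(build_prompt("I'm terribly sorry I missed our meeting."))
        # A model reply would then be fed to parse_annotation(); here we simulate one.
        fake_reply = ('[{"span": "I", "component": "apologiser"}, '
                      '{"span": "terribly", "component": "intensifier"}]')
        print(parse_annotation(fake_reply))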


Daily Digest December 20, 2019 – BioDecoded

#artificialintelligence

Digitized patient charts were supposed to revolutionize medical practice. Artificial intelligence could help unlock their potential. Identification of functional elements for a protein of interest is important for achieving a mechanistic understanding. Here, researchers report a strategy, PArsing fragmented DNA Sequences from CRISPR Tiling MUtagenesis Screening (PASTMUS), which provides a streamlined workflow and a bioinformatics pipeline to identify critical amino acids of proteins in their native biological contexts. Determining how chromosomes are positioned and folded within the nucleus is critical to understanding the role of chromatin topology in gene regulation.


Compbio.mit.edu - MIT Computational Biology Group - Kellis Lab at MIT and Broad Institute

#artificialintelligence

Variation and Disease: Translating genetic findings into therapeutics remains an unsolved challenge, partly because in 93% of cases, disease-associated common variants do not disrupt proteins directly, but instead alter their genomic control elements. Our group develops and uses epigenomic maps of regulatory elements, and cellular circuits linking them to their regulators and target genes, in order to understand how human genetic variation contributes to disease and cancer. We have developed resources and methods for studying how genetic variation impacts gene expression, regulatory region activity, cellular phenotypes, and ultimately human disease. We have applied these methods to obesity, Alzheimer's disease, cardiovascular traits, psychiatric disorders, and cancer, resulting in multiple insights. In addition to dissecting these circuits, we have used gene manipulations and genome editing to reverse the phenotypic signatures of disease from risk and non-risk individuals, paving the way for genomics-based therapeutics.